Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach
A major challenge in reinforcement learning is determining which state-action pairs are responsible for delayed future rewards. Reward redistribution addresses this by re-assigning credit to each time step of an observed sequence. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable reward redistribution while preserving policy invariance. In this paper, we first study the role of causal generative models in reward redistribution by characterizing the generation of Markovian rewards and the trajectory-wise long-term return, and then propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed-reward scenarios. Specifically, GRD first identifies the unobservable Markovian rewards and causal relations in the generative process.
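As a hedged sketch of the return-decomposition idea behind this line of work (not the GRD implementation itself), the PyTorch snippet below trains a proxy-reward network whose per-step predictions are constrained to sum to the observed delayed return; the names `ProxyReward` and `decomposition_loss` are illustrative.

```python
import torch
import torch.nn as nn

# Minimal sketch of trajectory-wise return decomposition (not the GRD
# implementation): a proxy-reward network predicts a Markovian reward
# r_hat(s_t, a_t) for every step and is trained so that the per-step
# predictions sum to the observed episodic return.
class ProxyReward(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states, actions):
        # states: (T, state_dim), actions: (T, action_dim)
        return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)

def decomposition_loss(model, states, actions, episode_return):
    """Squared error between the sum of predicted step rewards and the
    observed delayed return; minimizing it redistributes credit over time."""
    r_hat = model(states, actions)  # (T,)
    return (r_hat.sum() - episode_return) ** 2
```

Because the redistributed rewards sum to the original return, they can be fed to a standard RL algorithm as a dense training signal without changing which policies are optimal, which is the policy-invariance property the abstract refers to.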
Reviews: Equality of Opportunity in Classification: A Causal Approach
I acknowledge having read the rebuttal. Fairness is a complicated and important matter. Due to the nature of the problem, there might not be a universal characterization of it, but any proposed criterion should be accompanied by a compelling story and a reasonable explanation of why we should consider it. This paper provides a new (causal) interpretation of equalized odds (EO), an associative measure that has been used as a framework to talk about discrimination in classification problems. The central point of the paper is to learn a fair classifier by constraining each of the three (causal) components of EO (i.e., the counterfactual direct, indirect, and spurious components).
Council Post: Three Ways A Causal Approach Can Improve Trust In AI
Bernd Greifeneder is the CTO and Founder of Dynatrace, a software intelligence company that helps simplify enterprise cloud complexity. IT, development, and business departments are under more pressure than ever to innovate. However, this has led to applications becoming increasingly complex as organizations move to more dynamic, multicloud environments for greater agility. DevOps and SRE teams need to make sense of this complexity and optimize their services, but doing so drains the time they can devote to innovation. The move to cloud-native architectures is also making it harder for these teams to quickly identify vulnerabilities.
Causal Markov Decision Processes: Learning Good Interventions Efficiently
Lu, Yangyi, Meisami, Amirhossein, Tewari, Ambuj
We introduce Causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and emerging application areas such as digital healthcare and digital marketing can benefit from modeling with C-MDPs due to the causal mechanisms underlying the relationship between interventions and states/rewards. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm that exploits the causal structure in C-MDPs and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. We prove that C-UCBVI satisfies an $\tilde{O}(HS\sqrt{ZT})$ regret bound, where $T$ is the total number of time steps, $H$ is the episodic horizon, and $S$ is the cardinality of the state space. Notably, our regret bound does not scale with the size of the action/intervention space ($A$), but only with a causal graph dependent quantity $Z$, which can be exponentially smaller than $A$. By extending C-UCBVI to the factored MDP setting, we propose the causal factored UCBVI (CF-UCBVI) algorithm, which further reduces the regret exponentially in terms of $S$. Furthermore, we show that RL algorithms for linear MDP problems can also be incorporated into C-MDPs. We empirically show the benefit of our causal approaches in various settings, validating our algorithms and theoretical results.
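To make the flavor of the result concrete, here is a minimal tabular sketch of optimistic value iteration in the UCBVI style. It is an illustrative assumption rather than the paper's C-UCBVI pseudocode: the parameter `Z` stands in for the causal-graph-dependent quantity that replaces the action-set size $A$ in the confidence width, and all function and argument names are hypothetical.

```python
import numpy as np

# Illustrative UCB-style value iteration (a generic sketch, not C-UCBVI):
# optimistic backward planning with a per-(s, a) exploration bonus whose
# width scales with a causal quantity Z instead of the number of actions.
def ucb_value_iteration(P_hat, R_hat, N, H, Z, delta=0.05):
    """P_hat: (S, A, S) empirical transitions, R_hat: (S, A) empirical
    rewards, N: (S, A) visit counts. Returns optimistic Q of shape (H, S, A)."""
    S, A, _ = P_hat.shape
    Q = np.zeros((H + 1, S, A))
    V = np.zeros((H + 1, S))  # V[H] = 0 at the terminal step
    log_term = np.log(S * A * H / delta)
    for h in range(H - 1, -1, -1):
        # Confidence width ~ sqrt(Z), not sqrt(A): the causal structure
        # shrinks the effective number of interventions to explore.
        bonus = H * np.sqrt(Z * log_term / np.maximum(N, 1))
        Q[h] = np.minimum(R_hat + P_hat @ V[h + 1] + bonus, H)  # optimism, clipped at H
        V[h] = Q[h].max(axis=1)
    return Q[:H]
```

Acting greedily with respect to this optimistic `Q` is what yields regret that depends on the confidence-width quantity (here `Z`) rather than on the raw size of the intervention set.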
Equality of Opportunity in Classification: A Causal Approach
Zhang, Junzhe, Bareinboim, Elias
The Equalized Odds (for short, EO) is one of the most popular measures of discrimination used in the supervised learning setting. It ascertains fairness through the balance of the misclassification rates (false positive and false negative) across the protected groups -- e.g., in the context of law enforcement, an African-American defendant who would not commit a future crime will have an equal opportunity of being released, compared to a non-recidivating Caucasian defendant. Despite this noble goal, it has been acknowledged in the literature that statistical tests based on the EO are oblivious to the underlying causal mechanisms that generated the disparity in the first place (Hardt et al. 2016). This leads to a critical disconnect between statistical measures readable from the data and the meaning of discrimination in the legal system, where characterizing discrimination requires compelling evidence that the observed disparity is tied to a specific causal process deemed unfair by society. The goal of this paper is to develop a principled approach to connecting the statistical disparities characterized by the EO with the underlying, elusive, and frequently unobserved causal mechanisms that generated such inequality.
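For reference, the associative EO test that the abstract starts from can be computed directly from model outputs. The sketch below (with illustrative names `y_true`, `y_pred`, `group`, assuming binary 0/1 labels and predictions) measures the false-positive and false-negative rate gaps across groups; as the paper argues, this statistical balance by itself says nothing about the causal mechanism behind any observed gap.

```python
import numpy as np

def error_rates(y_true, y_pred, group, g):
    """FPR and FNR for protected group g (binary 0/1 labels and predictions)."""
    mask = group == g
    fpr = np.mean(y_pred[mask][y_true[mask] == 0])      # predicted 1 among true 0
    fnr = np.mean(1 - y_pred[mask][y_true[mask] == 1])  # predicted 0 among true 1
    return fpr, fnr

def eo_gaps(y_true, y_pred, group):
    """Max absolute FPR/FNR differences across groups; EO holds when both are ~0."""
    rates = [error_rates(y_true, y_pred, group, g) for g in np.unique(group)]
    fprs, fnrs = zip(*rates)
    return max(fprs) - min(fprs), max(fnrs) - min(fnrs)
```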